Brief Description

The outline of this project is shown below.

Inverting the Generator (30 pts)
Interpolate your Cats (10 pts)
Scribble to Image (40 pts)
EC: Stable Diffusion (10pts)
EC: High-res Grumpy Cat (2pts)
EC: Afhqcat Dataset (2pt)

1. Content Reconstruction (30 pts)

Hyperparameter Tuning

Ablation on model (vanilla, stylegan)

I selected the StyleGAN model because its adaptive instance normalization (AdaIN) layers and improved network structures enable it to generate images with realistic and diverse features. In comparison to vanilla GAN, StyleGAN exhibits a superior ability to produce high-quality images.

Ablation on latent (z, w, w+)

I selected the w+ latent space. It extends the intermediate latent space w by using a separate latent vector for each generator layer, which provides finer-grained control over the generated images.
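As a rough illustration of how the three latent spaces relate, here is a minimal pure-Python sketch. It is not the project's code: `toy_mapping` is a hypothetical stand-in for StyleGAN's learned mapping network, and `LATENT_DIM` and `NUM_LAYERS` are toy values chosen for readability.

```python
import random

LATENT_DIM = 8   # assumed toy size; real StyleGAN models typically use 512
NUM_LAYERS = 4   # assumed number of style-modulated generator layers


def sample_z(dim=LATENT_DIM, seed=0):
    """Sample z from a standard normal prior."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def toy_mapping(z):
    """Hypothetical stand-in for the learned mapping network (z -> w)."""
    return [0.5 * v + 0.1 for v in z]


def to_w_plus(w, num_layers=NUM_LAYERS):
    """Build w+ by giving every generator layer its own copy of w."""
    return [list(w) for _ in range(num_layers)]


z = sample_z()
w = toy_mapping(z)
w_plus = to_w_plus(w)

# In w+, one layer's code can change without touching the other layers.
w_plus[0][0] += 1.0
print(len(z), len(w), len(w_plus), len(w_plus[0]))
```

Optimizing in w+ means there are `NUM_LAYERS` times as many free parameters as in w, which is where the extra control comes from.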

Ablation on loss_type (l1, l2)

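I opted for l1 as the loss_type. The l2 loss squares each error, so it is dominated by large errors and barely penalizes small ones, whereas the l1 loss penalizes every error in proportion to its magnitude. As a result, the l1 loss is less prone to blurring and incentivizes the network to generate images with sharper edges and more pronounced features.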
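To make the difference between the two losses concrete, the sketch below computes both on a toy example with three small errors and one large error. The pixel values are made up for illustration, and the helper `outlier_share` is a hypothetical name that is not part of the project code.

```python
def l1_loss(pred, target):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def l2_loss(pred, target):
    """Mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def outlier_share(pred, target, power):
    """Fraction of the summed |error|**power that comes from the largest error."""
    contribs = [abs(p - t) ** power for p, t in zip(pred, target)]
    return max(contribs) / sum(contribs)


target = [0.0, 0.0, 0.0, 0.0]
pred = [0.1, 0.1, 0.1, 2.0]  # three small errors and one large error

print("l1:", l1_loss(pred, target), "outlier share:", outlier_share(pred, target, 1))
print("l2:", l2_loss(pred, target), "outlier share:", outlier_share(pred, target, 2))
```

In this toy case the single large error accounts for about 87% of the l1 loss but about 99% of the l2 loss, so under l2 the small errors contribute almost nothing to the gradient.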

Ablation on perc_wgt (0, 0.1, 0.01)

I selected a perc_wgt value of 0.01. A higher perceptual loss weight, such as perc_wgt = 0.1, tends to cause the generator to focus excessively on the reference image, leading to outputs that lack creativity. Conversely, using no perceptual loss (e.g., perc_wgt = 0) results in perceptually unrealistic and unappealing outputs.
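The role of perc_wgt can be sketched as follows, assuming the total objective is the pixel loss plus the perceptual loss scaled by perc_wgt. The additive form and the loss values below are assumptions for illustration only; they are not measurements from these experiments.

```python
def total_loss(pixel_loss, perc_loss, perc_wgt):
    """Pixel loss plus weighted perceptual loss (assumed additive form)."""
    return pixel_loss + perc_wgt * perc_loss


# Hypothetical loss values for a single optimization step.
pixel_loss, perc_loss = 0.20, 3.0

for perc_wgt in (0.0, 0.01, 0.1):
    total = total_loss(pixel_loss, perc_loss, perc_wgt)
    perc_share = perc_wgt * perc_loss / total
    print(f"perc_wgt={perc_wgt}: total={total:.3f}, perceptual share={perc_share:.0%}")
```

Under these assumed values, raising perc_wgt from 0.01 to 0.1 shifts the objective from mostly pixel-driven to mostly perceptual, which is the kind of trade-off the ablation explores.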

Run time for Ablations

Using a single RTX 3090 GPU, the vanilla GAN took 8.711 seconds to run, while StyleGAN's runtime ranged from 25.298 to 26.602 seconds, depending on the selected hyperparameters.

Visual Results

Now that the hyperparameter tuning is done, let's look at some visual results.

2. Interpolate your Cats (10 pts)

Visual Results

The outcomes of the interpolated gif experiments are presented below. The resulting gifs exhibit exceptional visual appeal, realism, and coherence.
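For reference, one simple way to produce such a sequence is to interpolate linearly between the latent codes of two inverted images and decode each intermediate code into a frame. The sketch below covers only the interpolation step and assumes linear interpolation; the latent values and the names `lerp` and `interpolation_path` are illustrative and not taken from the project code.

```python
def lerp(a, b, t):
    """Linearly interpolate between latent codes a and b at position t in [0, 1]."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def interpolation_path(a, b, num_frames):
    """Return num_frames latent codes evenly spaced from a to b (inclusive)."""
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    return [lerp(a, b, i / (num_frames - 1)) for i in range(num_frames)]


# Toy 3-dimensional latent codes standing in for two inverted images.
latent_a = [0.0, 1.0, -1.0]
latent_b = [1.0, 0.0, 1.0]

frames = interpolation_path(latent_a, latent_b, num_frames=5)
for code in frames:
    print(code)
```

Each code in `frames` would then be passed through the generator, and the decoded images written out in order as the frames of the gif.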

3. Scribble to Image (40 pts)

Visual Results

Below, I present some results for the scribble to image task. While the output images resemble cats, they suffer from issues such as distortions, artifacts, and excessive use of blues. This task is challenging due to the difficulty of interpreting incomplete and ambiguous hand-drawn sketches and translating them into coherent images.

4. EC: Stable Diffusion (10pts)

I used Stable Diffusion to generate a set of images from text prompts; the results are presented below.

Visual Results

"A black cat scribble with a big smile"

"A brown cat scribble with an very angry looking"

"A white cat scribble with a big head and a curious looking"

5. EC: High-res Grumpy Cat (2pts)

Visual Results 128

I conducted image generation experiments on grumpy cat images with a resolution of 128 × 128, and the resulting images are displayed below.

Visual Results 256

I also conducted image generation experiments on grumpy cat images with a resolution of 256 × 256, and the results are presented below.

6. EC: Afhqcat Dataset (2pt)

Visual Results

Additionally, I conducted image generation experiments on the Afhqcat dataset; the results are presented below. It is worth noting that generating high-quality images on the Afhqcat dataset, which has a resolution of 512 × 512, is a challenging task.